
Collaborating Authors

Charles County




MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking

Chen, Ting-Chih, Tang, Chia-Wei, Thomas, Chris

arXiv.org Artificial Intelligence

Fact-checking real-world claims often requires reviewing multiple multimodal documents to assess a claim's truthfulness, which is a highly laborious and time-consuming task. In this paper, we present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal, multi-document datasets. The model takes inputs in the form of documents, images, and a claim, with the objective of assisting in fact-checking tasks. We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths. To train our model, we leverage a novel reinforcement learning-based entailment objective to generate summaries that provide evidence distinguishing between different truthfulness labels. To assess the efficacy of our approach, we conduct experiments on both an existing benchmark and a new dataset of multi-document claims that we contribute. Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset and demonstrates strong performance on our new Multi-News-Fact-Checking dataset.
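The abstract says only that the model is perceiver-based and handles inputs of arbitrary length. In a Perceiver, that property comes from the cross-attention step: a fixed number of latent vectors attend over an input sequence of any length, so the output size depends only on the number of latents. The minimal sketch below illustrates just that step, assuming a single attention head, no learned query/key/value projections, and plain Python lists. The function names are hypothetical, and this is not the MetaSumPerceiver architecture itself.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents: Matrix, inputs: Matrix) -> Matrix:
    """Each latent (query) attends over all inputs (keys = values here).

    Output has one row per latent, regardless of how many inputs there are.
    Assumes latents and inputs share the same dimension d.
    """
    d = len(latents[0])
    out: Matrix = []
    for q in latents:
        # Scaled dot-product scores against every input vector.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in inputs]
        weights = softmax(scores)
        # Weighted average of the input vectors.
        out.append([sum(w * x[j] for w, x in zip(weights, inputs)) for j in range(d)])
    return out


if __name__ == "__main__":
    latents = [[0.1 * (i + j) for j in range(4)] for i in range(2)]  # 2 latents, dim 4
    text_tokens = [[1.0, 0.0, 0.0, 0.0]] * 7     # hypothetical text embeddings
    image_patches = [[0.0, 1.0, 0.0, 0.0]] * 16  # hypothetical image embeddings
    summary = cross_attend(latents, text_tokens + image_patches)
    print(len(summary), len(summary[0]))  # prints: 2 4
```

Because the output always has one row per latent, embeddings from different modalities (assumed here to be projected to a shared dimension) can simply be concatenated into one input list of any length.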


D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Tirumala, Kushal, Simig, Daniel, Aghajanyan, Armen, Morcos, Ari S.

arXiv.org Artificial Intelligence

Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on ever-larger portions of the internet leads to consistent performance improvements, the size of these improvements diminishes with scale, and there has been little work exploring the effect of data selection on pre-training and downstream performance beyond simple de-duplication methods such as MinHash. Here, we show that careful data selection (on top of de-duplicated data) via pre-trained model embeddings can speed up training (20% efficiency gains) and improve average downstream accuracy on 16 NLP tasks (up to 2%) at the 6.7B model scale. Furthermore, we show that repeating data intelligently consistently outperforms baseline training (while repeating random data performs worse than baseline training). Our results indicate that clever data selection can significantly improve LLM pre-training, call into question the common practice of training for a single epoch on as much data as possible, and demonstrate a path to keep improving our models past the limits of randomly sampling web data.
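The abstract names MinHash as the simple de-duplication baseline that D4 builds on. The idea is that the fraction of positions where two MinHash signatures agree estimates the Jaccard similarity of the documents' shingle sets. The sketch below assumes word 3-gram shingles, seeded 64-bit BLAKE2b hashes standing in for random permutations, and a 0.8 similarity threshold; these are illustrative choices, not values from the paper, and the embedding-based selection D4 adds on top is not shown.

```python
import hashlib
from typing import List, Sequence, Set


def shingles(text: str, n: int = 3) -> Set[str]:
    """Word n-grams of a document; short documents become a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def _hash(token: str, seed: int) -> int:
    # Seeded 64-bit hash standing in for one random permutation.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens: Set[str], num_perm: int = 128) -> List[int]:
    # For each seed, keep the minimum hash value over the document's shingles.
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]


def estimated_jaccard(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    # Fraction of agreeing positions approximates |A & B| / |A | B|.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def minhash_dedup(docs: Sequence[str], threshold: float = 0.8,
                  num_perm: int = 128) -> List[str]:
    """Keep a document only if no previously kept one looks like a near-duplicate.

    Pairwise comparison is O(n^2); large corpora typically bucket
    signatures with LSH banding instead.
    """
    kept: List[str] = []
    kept_sigs: List[List[int]] = []
    for doc in docs:
        sig = minhash_signature(shingles(doc), num_perm)
        if all(estimated_jaccard(sig, s) < threshold for s in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy dog",  # exact duplicate
        "large language models are trained on web text",
    ]
    print(minhash_dedup(corpus))
```

Here the second document is dropped as a duplicate of the first, while the third shares no shingles with either and is kept.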


Schools deploy AI technology to protect against active shooters

FOX News

Fox News correspondent Matt Finn has the latest on the impact of AI technology that some say could outpace humans on 'Special Report.' WASHINGTON – While most people look to artificial intelligence, or AI, for quick answers to complex problems, a growing number of school districts are turning to the technology to keep their students and staff safe. A school district in Charles County, Maryland, roughly an hour from Washington, D.C., is in the process of installing software and hardware which would allow their current security cameras to detect a potential active shooter. "This artificial intelligence has the ability to be able to identify a weapon, to assess what's going on and how that person is acting," said Jason Stoddard, Director of School Safety and Security for Charles County Public Schools. The district, through a state grant, is in the process of installing AI gun detection technology at all of its campuses.


Computer to call balls and strikes in minor league

FOX News

FILE - In this May 13, 2018, file photo, MLB umpire Joe West, right, talks with a player in the ninth inning during a baseball game between the Arizona Diamondbacks and the Washington Nationals in Phoenix. West, who has umpired more than 5,000 big league games, said the 2016 TrackMan computer system test was far from perfect. NEW YORK – Get ready for strikes by robots. Computers will be used for ball/strike calls starting April 25 in the independent Atlantic League, where the distance between home and first will be shortened by 3 inches. The distance between the mound and home plate will lengthen by 2 feet for the second half of the season beginning July 12.


Computer to call balls and strikes in minor league

#artificialintelligence

Get ready for strikes by robots. Computers will be used for ball/strike calls starting April 25 in the independent Atlantic League, where the distance between home and first will be shortened by 3 inches. The distance between the mound and home plate will lengthen by 2 feet for the second half of the season beginning July 12. The 60-foot-6-inch distance between the front of the pitching rubber and the back point of home plate has been standard since 1893, but Major League Baseball reached a three-year deal to experiment in the Atlantic League, an eight-team circuit that occasionally produces big leaguers. Infield defensive shifts will be limited.


Toward Metric Indexes for Incremental Insertion and Querying

Raff, Edward, Nicholas, Charles

arXiv.org Machine Learning

In this work we explore the use of metric index structures, which accelerate nearest-neighbor queries, in the scenario where we need to interleave insertions and queries during deployment. This use case is inspired by a real-life need in malware analysis triage, and is surprisingly understudied. Existing work tends to focus only on final query efficiency, often does not support incremental insertion, or does not support arbitrary distance metrics. We modify and improve three algorithms to support our scenario of incremental insertion and querying with arbitrary metrics, and evaluate them on multiple datasets and distance metrics while varying the value of $k$ for the desired number of nearest neighbors. In doing so we determine that our improved Vantage-Point tree of Minimum-Variance performs best for this scenario.
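The abstract does not spell out the modified algorithms, so the following is only a plain vantage-point tree adapted for interleaved insertion and $k$-nearest-neighbor queries under any user-supplied metric. It is not the paper's minimum-variance variant, which chooses its splits differently. Assumptions: leaves hold up to `leaf_size` points and are split around their median distance once they overflow, existing split radii are never rebuilt, and the class and method names are hypothetical. Queries prune with the triangle inequality: with query distance `d` to a vantage point, split radius `mu`, and current $k$-th best distance `tau`, the inner ball can only contain a better neighbor if `d - tau < mu`, and the outer shell only if `d + tau >= mu`.

```python
import heapq
import itertools
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class VPNode:
    """Either a leaf holding a small bucket, or an internal split node."""

    def __init__(self) -> None:
        self.bucket: List[Point] = []
        self.vp: Optional[Point] = None          # vantage point (internal nodes only)
        self.mu = 0.0                            # split radius around the vantage point
        self.inner: Optional["VPNode"] = None    # points with d(p, vp) < mu
        self.outer: Optional["VPNode"] = None    # points with d(p, vp) >= mu


class IncrementalVPTree:
    def __init__(self, metric: Metric, leaf_size: int = 8) -> None:
        self.metric = metric
        self.leaf_size = max(1, leaf_size)
        self.root = VPNode()

    def insert(self, x: Point) -> None:
        node = self.root
        # Descend using the fixed split radii; earlier splits are never revisited.
        while node.vp is not None:
            node = node.inner if self.metric(x, node.vp) < node.mu else node.outer
        node.bucket.append(x)
        if len(node.bucket) > self.leaf_size:
            self._split(node)

    def _split(self, node: VPNode) -> None:
        # Use the first bucketed point as vantage point and split at the median distance.
        vp, rest = node.bucket[0], node.bucket[1:]
        dists = sorted(self.metric(p, vp) for p in rest)
        node.vp, node.mu = vp, dists[len(dists) // 2]
        node.inner, node.outer = VPNode(), VPNode()
        for p in rest:
            child = node.inner if self.metric(p, vp) < node.mu else node.outer
            child.bucket.append(p)
        node.bucket = []

    def knn(self, q: Point, k: int) -> List[Tuple[float, Point]]:
        """Return up to k (distance, point) pairs, nearest first."""
        if k <= 0:
            return []
        heap: List[Tuple[float, int, Point]] = []  # max-heap via negated distances
        counter = itertools.count()                # tie-breaker so points are never compared

        def consider(p: Point, d: float) -> None:
            if len(heap) < k:
                heapq.heappush(heap, (-d, next(counter), p))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, next(counter), p))

        def tau() -> float:
            # Current k-th best distance, or infinity until k candidates are found.
            return -heap[0][0] if len(heap) == k else math.inf

        def search(node: Optional[VPNode]) -> None:
            if node is None:
                return
            if node.vp is None:
                for p in node.bucket:
                    consider(p, self.metric(q, p))
                return
            d = self.metric(q, node.vp)
            consider(node.vp, d)
            if d < node.mu:
                search(node.inner)
                if d + tau() >= node.mu:
                    search(node.outer)
            else:
                search(node.outer)
                if d - tau() < node.mu:
                    search(node.inner)

        search(self.root)
        return sorted(((-neg_d, p) for neg_d, _, p in heap), key=lambda item: item[0])


if __name__ == "__main__":
    tree = IncrementalVPTree(math.dist, leaf_size=2)
    for p in [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]:
        tree.insert(p)
    print(tree.knn((0.1, 0.1), 2))  # query between insertions
    tree.insert((0.0, 0.2))
    print(tree.knn((0.1, 0.1), 2))
```

Because the metric is just a callable, swapping in another true metric (for example, edit distance over byte strings) requires no change to the tree, which is the arbitrary-metric requirement the abstract describes.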


Is This How We Keep People from Starving Once the Robots Take Over?

#artificialintelligence

What are we going to do for all of the people displaced by robots? From truck drivers to (gasp!) writers, the growing number of professions impacted or even altogether eliminated by automation and artificial intelligence presents a significant potential economic and political challenge. Automation and artificial intelligence are a potential crisis, not an inevitable one. One idea is to find a way to guarantee a minimal standard of living, even in a world where, due to technology, the number of available jobs may be much smaller. One way to do that is to implement a relatively old idea known as the guaranteed basic income.